Overview

Dataset statistics

Number of variables: 9
Number of observations: 3132
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 220.3 KiB
Average record size in memory: 72.0 B

Variable types

NUM: 8
CAT: 1

Reproduction

Analysis started: 2020-08-24 21:34:15.038038
Analysis finished: 2020-08-24 21:34:31.760836
Duration: 16.72 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Diameter is highly correlated with Length and 2 other fields (High correlation)
Length is highly correlated with Diameter and 3 other fields (High correlation)
Whole weight is highly correlated with Length and 4 other fields (High correlation)
Shucked weight is highly correlated with Whole weight and 1 other fields (High correlation)
Viscera weight is highly correlated with Length and 3 other fields (High correlation)
Shell weight is highly correlated with Length and 3 other fields (High correlation)

Variables

Sex
Categorical

Distinct count: 3
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 24.5 KiB

Value  Count  Frequency (%)
M      1138   36.3%
I      1013   32.3%
F      981    31.3%
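The Frequency (%) column is each count divided by the number of observations. A minimal sketch of that arithmetic, using the counts reported for Sex (the variable names are illustrative, not part of pandas-profiling):

```python
# Counts for the Sex variable as reported in the table above.
counts = {"M": 1138, "I": 1013, "F": 981}

# The total should match the number of observations in the overview.
total = sum(counts.values())

# Frequency in percent, rounded to one decimal as in the report.
frequencies = {sex: round(100 * n / total, 1) for sex, n in counts.items()}

for sex, pct in frequencies.items():
    print(f"{sex}: {counts[sex]} of {total} = {pct}%")
```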

Length (characters per Sex value)

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Length
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 133
Unique (%): 4.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5237068965517241
Minimum: 0.11
Maximum: 0.815
Zeros: 0
Zeros (%): 0.0%
Memory size: 24.5 KiB

Quantile statistics

Minimum: 0.11
5-th percentile: 0.295
Q1: 0.45
Median: 0.545
Q3: 0.615
95-th percentile: 0.69
Maximum: 0.815
Range: 0.705
Interquartile range (IQR): 0.165

Descriptive statistics

Standard deviation: 0.1198914431
Coefficient of variation (CV): 0.2289285169
Kurtosis: 0.04474549084
Mean: 0.5237068966
Median Absolute Deviation (MAD): 0.08
Skewness: -0.636597264
Sum: 1640.25
Variance: 0.01437395814
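Several of these statistics are derived from others, which gives a quick consistency check. The sketch below recomputes the mean, range, IQR, CV, and variance for Length from the reported sum, quantiles, and standard deviation. The variable names are illustrative, and the inputs are the rounded values printed in this report, so the results agree only up to rounding.

```python
import math

# Values copied from the Length statistics in this report.
n = 3132                      # number of observations
total = 1640.25               # Sum
minimum, q1, q3, maximum = 0.11, 0.45, 0.615, 0.815
std = 0.1198914431            # Standard deviation

mean = total / n              # Mean = Sum / n
value_range = maximum - minimum
iqr = q3 - q1                 # Interquartile range = Q3 - Q1
cv = std / mean               # Coefficient of variation = std / mean
variance = std ** 2           # Variance = std squared

print(f"mean={mean:.10f} range={value_range:.3f} iqr={iqr:.3f}")
print(f"cv={cv:.10f} variance={variance:.11f}")
```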
Histogram with fixed size bins (bins=10)

Common values

Value                Count  Frequency (%)
0.55                 72     2.3%
0.625                71     2.3%
0.62                 68     2.2%
0.575                68     2.2%
0.58                 65     2.1%
0.6                  64     2.0%
0.5                  63     2.0%
0.52                 58     1.9%
0.57                 57     1.8%
0.59                 56     1.8%
Other values (123)   2490   79.5%

Minimum 5 values

Value                Count  Frequency (%)
0.11                 1      < 0.1%
0.13                 2      0.1%
0.135                1      < 0.1%
0.14                 2      0.1%
0.15                 1      < 0.1%

Maximum 5 values

Value                Count  Frequency (%)
0.815                1      < 0.1%
0.8                  1      < 0.1%
0.78                 2      0.1%
0.775                1      < 0.1%
0.77                 2      0.1%

Diameter
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 110
Unique (%): 3.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4079517879948914
Minimum: 0.09
Maximum: 0.65
Zeros: 0
Zeros (%): 0.0%
Memory size: 24.5 KiB

Quantile statistics

Minimum: 0.09
5-th percentile: 0.22
Q1: 0.35
Median: 0.425
Q3: 0.48
95-th percentile: 0.545
Maximum: 0.65
Range: 0.56
Interquartile range (IQR): 0.13

Descriptive statistics

Standard deviation: 0.09933632486
Coefficient of variation (CV): 0.2435001581
Kurtosis: -0.07025117819
Mean: 0.407951788
Median Absolute Deviation (MAD): 0.065
Skewness: -0.6038878442
Sum: 1277.705
Variance: 0.009867705436

Histogram with fixed size bins (bins=10)

Common values

Value                Count  Frequency (%)
0.45                 102    3.3%
0.475                96     3.1%
0.5                  86     2.7%
0.4                  82     2.6%
0.47                 71     2.3%
0.48                 71     2.3%
0.465                70     2.2%
0.375                69     2.2%
0.455                67     2.1%
0.46                 66     2.1%
Other values (100)   2352   75.1%

Minimum 5 values

Value                Count  Frequency (%)
0.09                 1      < 0.1%
0.095                1      < 0.1%
0.1                  2      0.1%
0.105                2      0.1%
0.11                 2      0.1%

Maximum 5 values

Value                Count  Frequency (%)
0.65                 1      < 0.1%
0.63                 3      0.1%
0.625                1      < 0.1%
0.62                 1      < 0.1%
0.615                1      < 0.1%

Height
Real number (ℝ≥0)

Distinct count: 49
Unique (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.13927681992337165
Minimum: 0.0
Maximum: 0.515
Zeros: 1
Zeros (%): < 0.1%
Memory size: 24.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.075
Q1: 0.115
Median: 0.14
Q3: 0.165
95-th percentile: 0.2
Maximum: 0.515
Range: 0.515
Interquartile range (IQR): 0.05

Descriptive statistics

Standard deviation: 0.03899113805
Coefficient of variation (CV): 0.2799542527
Kurtosis: 2.449882236
Mean: 0.1392768199
Median Absolute Deviation (MAD): 0.025
Skewness: 0.03735663857
Sum: 436.215
Variance: 0.001520308846

Histogram with fixed size bins (bins=10)

Common values

Value                Count  Frequency (%)
0.15                 200    6.4%
0.155                169    5.4%
0.14                 167    5.3%
0.175                166    5.3%
0.16                 151    4.8%
0.125                149    4.8%
0.165                147    4.7%
0.135                146    4.7%
0.12                 136    4.3%
0.145                132    4.2%
Other values (39)    1569   50.1%

Minimum 5 values

Value                Count  Frequency (%)
0                    1      < 0.1%
0.015                1      < 0.1%
0.02                 1      < 0.1%
0.025                4      0.1%
0.03                 6      0.2%

Maximum 5 values

Value                Count  Frequency (%)
0.515                1      < 0.1%
0.25                 3      0.1%
0.24                 3      0.1%
0.235                4      0.1%
0.23                 9      0.3%

Whole weight
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 2037
Unique (%): 65.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.828670019157088
Minimum: 0.008
Maximum: 2.8255
Zeros: 0
Zeros (%): 0.0%
Memory size: 24.5 KiB

Quantile statistics

Minimum: 0.008
5-th percentile: 0.1275
Q1: 0.4415
Median: 0.7995
Q3: 1.153
95-th percentile: 1.70095
Maximum: 2.8255
Range: 2.8175
Interquartile range (IQR): 0.7115

Descriptive statistics

Standard deviation: 0.4906301983
Coefficient of variation (CV): 0.5920694449
Kurtosis: 0.03429973195
Mean: 0.8286700192
Median Absolute Deviation (MAD): 0.35575
Skewness: 0.5466790619
Sum: 2595.3945
Variance: 0.2407179914

Histogram with fixed size bins (bins=10)

Common values

Value                Count  Frequency (%)
0.2225               8      0.3%
0.97                 6      0.2%
0.494                6      0.2%
0.18                 6      0.2%
0.717                5      0.2%
0.197                5      0.2%
0.196                5      0.2%
0.4775               5      0.2%
0.874                5      0.2%
0.487                5      0.2%
Other values (2027)  3076   98.2%

Minimum 5 values

Value                Count  Frequency (%)
0.008                1      < 0.1%
0.0105               1      < 0.1%
0.013                1      < 0.1%
0.014                1      < 0.1%
0.0145               1      < 0.1%

Maximum 5 values

Value                Count  Frequency (%)
2.8255               1      < 0.1%
2.7795               1      < 0.1%
2.657                1      < 0.1%
2.55                 1      < 0.1%
2.548                1      < 0.1%

Shucked weight
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 1390
Unique (%): 44.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.35922860791826305
Minimum: 0.0025
Maximum: 1.488
Zeros: 0
Zeros (%): 0.0%
Memory size: 24.5 KiB

Quantile statistics

Minimum: 0.0025
5-th percentile: 0.0545
Q1: 0.185375
Median: 0.3355
Q3: 0.4995
95-th percentile: 0.741225
Maximum: 1.488
Range: 1.4855
Interquartile range (IQR): 0.314125

Descriptive statistics

Standard deviation: 0.2219457029
Coefficient of variation (CV): 0.6178397209
Kurtosis: 0.601872859
Mean: 0.3592286079
Median Absolute Deviation (MAD): 0.1575
Skewness: 0.7237958254
Sum: 1125.104
Variance: 0.04925989502

Histogram with fixed size bins (bins=10)

Common values

Value                Count  Frequency (%)
0.175                9      0.3%
0.2945               8      0.3%
0.165                8      0.3%
0.2505               8      0.3%
0.2385               8      0.3%
0.096                7      0.2%
0.0415               7      0.2%
0.0675               7      0.2%
0.302                7      0.2%
0.097                7      0.2%
Other values (1380)  3056   97.6%

Minimum 5 values

Value                Count  Frequency (%)
0.0025               1      < 0.1%
0.0045               2      0.1%
0.005                2      0.1%
0.0055               1      < 0.1%
0.0065               2      0.1%

Maximum 5 values

Value                Count  Frequency (%)
1.488                1      < 0.1%
1.351                1      < 0.1%
1.3485               1      < 0.1%
1.2395               1      < 0.1%
1.232                1      < 0.1%

Viscera weight
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 843
Unique (%): 26.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.18073084291187738
Minimum: 0.0005
Maximum: 0.76
Zeros: 0
Zeros (%): 0.0%
Memory size: 24.5 KiB

Quantile statistics

Minimum: 0.0005
5-th percentile: 0.028275
Q1: 0.093375
Median: 0.17
Q3: 0.2525
95-th percentile: 0.38
Maximum: 0.76
Range: 0.7595
Interquartile range (IQR): 0.159125

Descriptive statistics

Standard deviation: 0.1099235988
Coefficient of variation (CV): 0.6082171534
Kurtosis: 0.1636080714
Mean: 0.1807308429
Median Absolute Deviation (MAD): 0.07925
Skewness: 0.613185568
Sum: 566.049
Variance: 0.01208319757

Histogram with fixed size bins (bins=10)

Common values

Value                Count  Frequency (%)
0.156                12     0.4%
0.15                 12     0.4%
0.037                11     0.4%
0.1715               11     0.4%
0.2145               11     0.4%
0.1415               10     0.3%
0.246                10     0.3%
0.1315               10     0.3%
0.1625               10     0.3%
0.1405               10     0.3%
Other values (833)   3025   96.6%

Minimum 5 values

Value                Count  Frequency (%)
0.0005               1      < 0.1%
0.002                1      < 0.1%
0.0025               2      0.1%
0.003                1      < 0.1%
0.0035               2      0.1%

Maximum 5 values

Value                Count  Frequency (%)
0.76                 1      < 0.1%
0.6415               1      < 0.1%
0.59                 1      < 0.1%
0.575                1      < 0.1%
0.5745               1      < 0.1%

Shell weight
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 830
Unique (%): 26.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.23915900383141764
Minimum: 0.003
Maximum: 1.005
Zeros: 0
Zeros (%): 0.0%
Memory size: 24.5 KiB

Quantile statistics

Minimum: 0.003
5-th percentile: 0.03955
Q1: 0.13
Median: 0.235
Q3: 0.33
95-th percentile: 0.48
Maximum: 1.005
Range: 1.002
Interquartile range (IQR): 0.2

Descriptive statistics

Standard deviation: 0.1390404721
Coefficient of variation (CV): 0.5813725175
Kurtosis: 0.591303839
Mean: 0.2391590038
Median Absolute Deviation (MAD): 0.1
Skewness: 0.6192253654
Sum: 749.046
Variance: 0.0193322529

Histogram with fixed size bins (bins=10)

Common values

Value                Count  Frequency (%)
0.265                34     1.1%
0.285                34     1.1%
0.185                33     1.1%
0.275                33     1.1%
0.25                 31     1.0%
0.335                31     1.0%
0.315                30     1.0%
0.17                 30     1.0%
0.235                29     0.9%
0.22                 28     0.9%
Other values (820)   2819   90.0%

Minimum 5 values

Value                Count  Frequency (%)
0.003                1      < 0.1%
0.0035               1      < 0.1%
0.004                2      0.1%
0.005                7      0.2%
0.0065               1      < 0.1%

Maximum 5 values

Value                Count  Frequency (%)
1.005                1      < 0.1%
0.897                1      < 0.1%
0.885                1      < 0.1%
0.85                 1      < 0.1%
0.815                1      < 0.1%

Rings
Real number (ℝ≥0)

Distinct count: 26
Unique (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.947956577266922
Minimum: 2
Maximum: 29
Zeros: 0
Zeros (%): 0.0%
Memory size: 24.5 KiB

Quantile statistics

Minimum: 2
5-th percentile: 6
Q1: 8
Median: 10
Q3: 11
95-th percentile: 16
Maximum: 29
Range: 27
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 3.230252485
Coefficient of variation (CV): 0.3247151774
Kurtosis: 2.350069229
Mean: 9.947956577
Median Absolute Deviation (MAD): 2
Skewness: 1.131830046
Sum: 31157
Variance: 10.43453112

Histogram with fixed size bins (bins=10)

Common values

Value                Count  Frequency (%)
9                    513    16.4%
10                   487    15.5%
8                    423    13.5%
11                   351    11.2%
7                    304    9.7%
12                   210    6.7%
6                    181    5.8%
13                   161    5.1%
5                    89     2.8%
14                   87     2.8%
Other values (16)    326    10.4%

Minimum 5 values

Value                Count  Frequency (%)
2                    1      < 0.1%
3                    11     0.4%
4                    42     1.3%
5                    89     2.8%
6                    181    5.8%

Maximum 5 values

Value                Count  Frequency (%)
29                   1      < 0.1%
27                   1      < 0.1%
26                   1      < 0.1%
25                   1      < 0.1%
23                   7      0.2%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only the direction of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
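As a minimal sketch of this calculation, the function below divides the sample covariance by the product of the sample standard deviations. The name `pearson_r` is illustrative rather than part of any library, and the inputs are the Length and Diameter values from the first ten sample rows of this report, so the result describes those ten rows only, not the full dataset.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (std_x * std_y)

# Length and Diameter from the first ten sample rows of this report.
length = [0.525, 0.445, 0.710, 0.680, 0.605, 0.555, 0.510, 0.655, 0.540, 0.515]
diameter = [0.400, 0.325, 0.540, 0.580, 0.470, 0.430, 0.405, 0.520, 0.415, 0.410]

r = pearson_r(length, diameter)

# Shifting and rescaling one variable (positive scale) leaves r unchanged.
r_rescaled = pearson_r([10 * v + 3 for v in length], diameter)

print(f"r = {r:.4f}, after rescaling Length: {r_rescaled:.4f}")
```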

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
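The sketch below follows that recipe under one stated assumption: tied values receive the average of the ranks they span, which is a common convention but is not specified in this report. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) are illustrative. The toy data is a cubic relationship, chosen because it is monotonic but not linear.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Monotonic but nonlinear toy data: y = x cubed.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(f"Spearman rho = {rho:.4f}, Pearson r = {r:.4f}")
```

On this toy data the ranks of x and y coincide, so ρ reaches 1 while r stays below 1.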

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
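A minimal sketch of this pair-counting definition (often called tau-a) follows. The function name `kendall_tau_a` is illustrative. Tied pairs count as neither concordant nor discordant here; library implementations such as SciPy's `kendalltau` default to a tie-corrected variant (tau-b), so results can differ when ties are present.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Three observations: two concordant pairs and one discordant pair.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(f"tau = {tau:.4f}")
```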

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

Sample

First rows

      Sex  Length  Diameter  Height  Whole weight  Shucked weight  Viscera weight  Shell weight  Rings
0     F    0.525   0.400     0.135   0.7140        0.3180          0.1380          0.2080        10
1     F    0.445   0.325     0.125   0.4550        0.1785          0.1125          0.1400        9
2     M    0.710   0.540     0.165   1.9590        0.7665          0.2610          0.7800        18
3     F    0.680   0.580     0.200   1.7870        0.5850          0.4530          0.6000        19
4     M    0.605   0.470     0.160   1.1735        0.4975          0.2405          0.3450        12
5     I    0.555   0.430     0.125   0.7005        0.3395          0.1355          0.2095        8
6     M    0.510   0.405     0.125   0.6925        0.3270          0.1550          0.1805        7
7     M    0.655   0.520     0.180   1.4920        0.7185          0.3600          0.3550        11
8     M    0.540   0.415     0.145   0.7400        0.2635          0.1680          0.2450        12
9     M    0.515   0.410     0.140   0.7355        0.3065          0.1370          0.2000        7

Last rows

      Sex  Length  Diameter  Height  Whole weight  Shucked weight  Viscera weight  Shell weight  Rings
3122  M    0.625   0.475     0.160   1.0845        0.5005          0.2355          0.3105        10
3123  M    0.660   0.515     0.155   1.4415        0.7055          0.3555          0.3350        10
3124  I    0.500   0.375     0.120   0.5420        0.2150          0.1160          0.1700        9
3125  F    0.620   0.485     0.165   1.1660        0.4830          0.2380          0.3550        13
3126  M    0.455   0.345     0.150   0.5795        0.1685          0.1250          0.2150        13
3127  M    0.415   0.315     0.120   0.4015        0.1990          0.0870          0.0970        8
3128  I    0.325   0.240     0.070   0.1520        0.0565          0.0305          0.0540        8
3129  M    0.565   0.455     0.155   0.9355        0.4210          0.1830          0.2600        11
3130  M    0.610   0.485     0.145   1.3305        0.7830          0.2255          0.2865        9
3131  F    0.655   0.505     0.190   1.3485        0.5935          0.2745          0.4250        12